{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tune hyperparameters for your RL training job\n",
    "This notebook shows how to use SageMaker's Automatic Model Tuning functionality to optimize the training of an RL model, using the Roboschool environment.  Note that the bayesian hyperparameter optimization algorithm used in SageMaker Automatic Model Tuning expects a stable objective function, which means that running the same training job multiple times with the same configuration should give about the same result.  Some RL training processes are highly non-deterministic, such that the same configuration sometimes performs well and sometimes fails to train.  These environments will not work well with the automatic model tuner.  However, the Roboschool environments are fairly stable and tend to perform similarly given the same hyperparameters.  So the hyperparameter tuner should work well here. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pick which Roboschool problem to solve\n",
    "\n",
    "Roboschool is an [open source](https://github.com/openai/roboschool/tree/master/roboschool) physics simulator that is commonly used to train RL policies for robotic systems.  Roboschool defines a [variety](https://github.com/openai/roboschool/blob/master/roboschool/__init__.py) of Gym environments that correspond to different robotics problems.  Here we're highlighting a few of them at varying levels of difficulty:\n",
    "\n",
    "- **Reacher (easy)** - a very simple robot with just 2 joints reaches for a target\n",
    "- **Hopper (medium)** - a simple robot with one leg and a foot learns to hop down a track \n",
    "- **Humanoid (difficult)** - a complex 3D robot with two arms, two legs, etc. learns to balance without falling over and then to run on a track\n",
    "\n",
    "The simpler problems train faster with less computational resources.  The more complex problems are more fun."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment the problem to work on\n",
    "#roboschool_problem = 'reacher'\n",
    "roboschool_problem = 'hopper'\n",
    "#roboschool_problem = 'humanoid'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-requisites \n",
    "\n",
    "### Imports\n",
    "\n",
    "To get started, we'll import the Python libraries we need, set up the environment with a few prerequisites for permissions and configurations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "import boto3\n",
    "import sys\n",
    "import os\n",
    "import glob\n",
    "import re\n",
    "import subprocess\n",
    "from IPython.display import HTML\n",
    "import time\n",
    "from time import gmtime, strftime\n",
    "sys.path.append(\"common\")\n",
    "from misc import get_execution_role, wait_for_s3_object\n",
    "from docker_utils import build_and_push_docker_image\n",
    "from sagemaker.rl import RLEstimator, RLToolkit, RLFramework"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup S3 bucket\n",
    "\n",
    "Set up the linkage and authentication to the S3 bucket that you want to use for checkpoint and the metadata. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sage_session = sagemaker.session.Session()\n",
    "s3_bucket = sage_session.default_bucket()  \n",
    "s3_output_path = 's3://{}/'.format(s3_bucket)\n",
    "print(\"S3 bucket path: {}\".format(s3_output_path))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Variables \n",
    "\n",
    "We define variables such as the job prefix for the training jobs *and the image path for the container (only when this is BYOC).*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a descriptive job name \n",
    "job_name_prefix = 'tune-'+roboschool_problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure resources to be used for tuning\n",
    "\n",
    "Tuning jobs need to happen in hosted SageMaker, not local mode.  So we pick the instance type.  Also we need to specify how many training jobs and how quickly to run them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note that Tuning cannot happen in local mode.\n",
    "instance_type = \"ml.m4.xlarge\"\n",
    "\n",
    "# Pick the total number of training jobs to run in this tuning job\n",
    "max_jobs = 50\n",
    "\n",
    "# How many jobs should run at a time.  Higher numbers here mean the tuning job runs much faster,\n",
    "# while lower numbers can sometimes get better results\n",
    "max_parallel_jobs = 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an IAM role\n",
    "\n",
    "Either get the execution role when running from a SageMaker notebook instance `role = sagemaker.get_execution_role()` or, when running from local notebook instance, use utils method `role = get_execution_role()` to create an execution role."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    role = sagemaker.get_execution_role()\n",
    "except:\n",
    "    role = get_execution_role()\n",
    "\n",
    "print(\"Using IAM role arn: {}\".format(role))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build docker container\n",
    "\n",
    "We must build a custom docker container with Roboschool installed.  This takes care of everything:\n",
    "\n",
    "1. Fetching base container image\n",
    "2. Installing Roboschool and its dependencies\n",
    "3. Uploading the new container image to ECR\n",
    "\n",
    "This step can take a long time if you are running on a machine with a slow internet connection.  If your notebook instance is in SageMaker or EC2 it should take 3-10 minutes depending on the instance type.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "\n",
    "cpu_or_gpu = 'gpu' if instance_type.startswith('ml.p') else 'cpu'\n",
    "repository_short_name = \"sagemaker-roboschool-ray-%s\" % cpu_or_gpu\n",
    "docker_build_args = {\n",
    "    'CPU_OR_GPU': cpu_or_gpu, \n",
    "    'AWS_REGION': boto3.Session().region_name,\n",
    "}\n",
    "custom_image_name = build_and_push_docker_image(repository_short_name, build_args=docker_build_args)\n",
    "print(\"Using ECR image %s\" % custom_image_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure Tuning\n",
    "\n",
    "Tuning jobs need to know what to vary when looking for a great configuration.  Sometimes this is called the search space, or the configuration space.  We do this by picking a set of hyperparameters to vary, and specify the ranges for each of them.  Unless you're experienced at automatically tuning hyperparameters, you should probably start with just one or two hyperparameters at a time to see how they effect the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner\n",
    "\n",
    "# The hyperparameters we're going to tune\n",
    "hyperparameter_ranges = {\n",
    "    # inspired by https://medium.com/aureliantactics/ppo-hyperparameters-and-ranges-6fc2d29bccbe\n",
    "    #'rl.training.config.clip_param': ContinuousParameter(0.1, 0.4),\n",
    "    #'rl.training.config.kl_target': ContinuousParameter(0.003, 0.03),\n",
    "    #'rl.training.config.vf_loss_coeff': ContinuousParameter(0.5, 1.0),\n",
    "    #'rl.training.config.entropy_coeff': ContinuousParameter(0.0, 0.01),\n",
    "    'rl.training.config.kl_coeff': ContinuousParameter(0.5, 1.0),\n",
    "    'rl.training.config.num_sgd_iter': IntegerParameter(3, 50),\n",
    "}\n",
    "\n",
    "# The hyperparameters that are the same for all jobs\n",
    "static_hyperparameters = {\n",
    "    \"rl.training.stop.time_total_s\": 600,  # Tell each training job to stop after 10 minutes\n",
    "    #'rl.training.config.num_sgd_iter': 7,\n",
    "    #'rl.training.config.sgd_minibatch_size': 1000,\n",
    "    #'rl.training.config.train_batch_size': 25000,\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare to launch the tuning job.\n",
    "\n",
    "First we create an estimator like we would if we were launching a single training job.  This will be used to create the `tuner` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "metric_definitions = RLEstimator.default_metric_definitions(RLToolkit.RAY)\n",
    "estimator = RLEstimator(entry_point=\"train-%s.py\" % roboschool_problem,\n",
    "                        source_dir='src',\n",
    "                        dependencies=[\"common/sagemaker_rl\"],\n",
    "                        image_name=custom_image_name,\n",
    "                        role=role,\n",
    "                        train_instance_type=instance_type,\n",
    "                        train_instance_count=1,\n",
    "                        output_path=s3_output_path,\n",
    "                        base_job_name=job_name_prefix,\n",
    "                        metric_definitions=metric_definitions,\n",
    "                        hyperparameters=static_hyperparameters,\n",
    "                    )\n",
    "\n",
    "tuner = HyperparameterTuner(estimator,\n",
    "                            objective_metric_name='episode_reward_mean',\n",
    "                            objective_type='Maximize',\n",
    "                            hyperparameter_ranges=hyperparameter_ranges,\n",
    "                            metric_definitions=metric_definitions,\n",
    "                            max_jobs=max_jobs,\n",
    "                            max_parallel_jobs=max_parallel_jobs,\n",
    "                            base_tuning_job_name=job_name_prefix,\n",
    "                           )\n",
    "tuner.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Monitor progress\n",
    "\n",
    "To see how your tuning job is doing, jump over to the SageMaker console.  Under the **Training** section, you'll see Hyperparameter tuning jobs, where you'll see the newly created job.  It will launch a series of TrainingJobs in your account, each of which will behave like a regular training job.  They will show up in the list, and when each job is completed, you'll see the final value they achieved for mean reward.  Each job will also emit algorithm metrics to cloudwatch, which you can see plotted in CloudWatch metrics.  To see these, click on the training job to det to its detail page, and then look for the link \"View algorithm metrics\" which will let you see a chart of how that job is progressing.  By changing the search criteria in the CloudWatch console, you can overlay the metrics for all the jobs in this tuning job."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  },
  "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
